Large Language Models Hallucination: A Comprehensive Survey
Alansari, Aisha, Luqman, Hamzah
Large language models (LLMs) have transformed natural language processing, achieving remarkable performance across diverse tasks. However, their impressive fluency often comes at the cost of producing false or fabricated information, a phenomenon known as hallucination. Hallucination refers to the generation of content by an LLM that is fluent and syntactically correct but factually inaccurate or unsupported by external evidence. Hallucinations undermine the reliability and trustworthiness of LLMs, especially in domains requiring factual accuracy. This survey provides a comprehensive review of research on hallucination in LLMs, with a focus on causes, detection, and mitigation. We first present a taxonomy of hallucination types and analyze their root causes across the entire LLM development lifecycle, from data collection and architecture design to inference. We further examine how hallucinations emerge in key natural language generation tasks. Building on this foundation, we introduce a structured taxonomy of detection approaches and another taxonomy of mitigation strategies. We also analyze the strengths and limitations of current detection and mitigation approaches and review existing evaluation benchmarks and metrics used to quantify LLM hallucinations. Finally, we outline key open challenges and promising directions for future research, providing a foundation for the development of more truthful and trustworthy LLMs.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > Saudi Arabia > Eastern Province > Dhahran (0.04)
- (10 more...)
- Overview (1.00)
- Research Report > New Finding (0.92)
- Health & Medicine (0.67)
- Education (0.67)
- Law (0.46)
- Energy (0.46)
Digital Twins: Initiatives, Technologies, and Use Cases in the Arab World
Digital twins (DTs) are virtual replicas of components, assets, systems, or processes, linked to their real-world counterparts, continuously updating their states and simulating their behavior in real time, as illustrated in Figure 1. They are adopted for monitoring, predicting, and optimizing the performance of diverse systems, bridging the gap between design, testing, and deployment. Significant efforts are being devoted across Arab R&D institutions to export technology tackling challenges that are not only pertinent to the region, but also of global importance, e.g., energy, sustainability, disaster management, healthcare, and urbanization, among many others. For instance, Khalifa University, UAE, is pioneering research into optical wireless communication using DTs.
- Asia > Middle East > UAE (0.24)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.05)
- (6 more...)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.69)
- Energy > Power Industry (0.69)
- (2 more...)
A Comprehensive Evaluation of the Sensitivity of Density-Ratio Estimation Based Fairness Measurement in Regression
Almajed, Abdalwahab, Tabar, Maryam, Najafirad, Peyman
The prevalence of algorithmic bias in Machine Learning (ML)-driven approaches has inspired growing research on measuring and mitigating bias in the ML domain. Accordingly, prior research studied how to measure fairness in regression, which is a complex problem. In particular, recent research proposed to formulate it as a density-ratio estimation problem and relied on a Logistic Regression-driven probabilistic classifier-based approach to solve it. However, there are several other methods to estimate a density ratio, and to the best of our knowledge, prior work did not study the sensitivity of such fairness measurement methods to the choice of the underlying density-ratio estimation algorithm. To fill this gap, this paper develops a set of fairness measurement methods with various density-ratio estimation cores and thoroughly investigates how different cores affect the achieved level of fairness. Our experimental results show that the choice of density-ratio estimation core can significantly affect the outcome of the fairness measurement method and can even generate inconsistent results with respect to the relative fairness of various algorithms. These observations suggest major issues with density-ratio-estimation-based fairness measurement in regression and a need for further research to enhance its reliability.
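The probabilistic-classifier formulation the paper builds on can be illustrated in a few lines: train a classifier to distinguish samples drawn from the two distributions p and q, then read the density ratio off the class-posterior odds, r(x) = c(x)/(1−c(x)) · n_q/n_p. The sketch below is a minimal, self-contained illustration (1-D Gaussians, hand-rolled logistic regression), not the authors' implementation.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.1, epochs=1500):
    """1-D logistic regression by gradient descent: P(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def density_ratio(x, w, b, n_p, n_q):
    """r(x) = p(x)/q(x) ~= c/(1-c) * n_q/n_p, where c = P(sample came from p | x)."""
    c = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return (c / (1.0 - c)) * (n_q / n_p)

random.seed(0)
p_samples = [random.gauss(1.0, 1.0) for _ in range(300)]  # p = N(1, 1)
q_samples = [random.gauss(0.0, 1.0) for _ in range(300)]  # q = N(0, 1)
xs = p_samples + q_samples
ys = [1] * len(p_samples) + [0] * len(q_samples)
w, b = fit_logistic(xs, ys)
# For N(1,1)/N(0,1) the true log-ratio is x - 0.5, so r(0.5) should be near 1.
print(density_ratio(0.5, w, b, 300, 300))
```

Swapping `fit_logistic` for a different density-ratio core (e.g. kernel-based estimators) is exactly the sensitivity the paper studies.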
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Dammam (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
EHSAN: Leveraging ChatGPT in a Hybrid Framework for Arabic Aspect-Based Sentiment Analysis in Healthcare
Alamoudi, Eman, Solaiman, Ellis
Arabic-language patient feedback remains under-analysed because dialect diversity and scarce aspect-level sentiment labels hinder automated assessment. To address this gap, we introduce EHSAN, a data-centric hybrid pipeline that merges ChatGPT pseudo-labelling with targeted human review to build the first explainable Arabic aspect-based sentiment dataset for healthcare. Each sentence is annotated with an aspect and sentiment label (positive, negative, or neutral), forming a pioneering Arabic dataset aligned with healthcare themes, with ChatGPT-generated rationales provided for each label to enhance transparency. To evaluate the impact of annotation quality on model performance, we created three versions of the training data: a fully supervised set with all labels reviewed by humans, a semi-supervised set with 50% human review, and an unsupervised set with only machine-generated labels. We fine-tuned two transformer models on these datasets for both aspect and sentiment classification. Experimental results show that our Arabic-specific model achieved high accuracy even with minimal human supervision, reflecting only a minor performance drop when using ChatGPT-only labels. Reducing the number of aspect classes notably improved classification metrics across the board. These findings demonstrate an effective, scalable approach to Arabic aspect-based sentiment analysis (SA) in healthcare, combining large language model annotation with human expertise to produce a robust and explainable dataset. Future directions include generalisation across hospitals, prompt refinement, and interpretable data-driven modelling.
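The three training-set variants described above differ only in how many machine-generated labels are replaced by human-reviewed ones. A minimal sketch of that mixing step (the function name and the toy label values are illustrative, not from the paper):

```python
import random

def build_training_set(machine_labels, human_labels, review_fraction, seed=0):
    """Replace a given fraction of machine labels with human-reviewed ones.
    review_fraction=1.0 -> fully supervised, 0.5 -> semi-supervised, 0.0 -> unsupervised."""
    rng = random.Random(seed)
    n = len(machine_labels)
    reviewed = set(rng.sample(range(n), int(n * review_fraction)))
    return [human_labels[i] if i in reviewed else machine_labels[i]
            for i in range(n)]

# Toy sentiment labels; in the paper the human pass corrects ChatGPT pseudo-labels.
machine = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
human = ["positive", "neutral", "neutral", "negative", "negative", "neutral"]
semi = build_training_set(machine, human, review_fraction=0.5)
print(semi)
```

The semi-supervised set keeps machine labels wherever no human reviewed the sentence, which is why label quality degrades gracefully as `review_fraction` shrinks.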
- North America > United States (0.68)
- Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.05)
- Europe > Switzerland (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Machine Learning Fairness in House Price Prediction: A Case Study of America's Expanding Metropolises
Almajed, Abdalwahab, Tabar, Maryam, Najafirad, Peyman
As a basic human need, housing plays a key role in enhancing health, well-being, and educational outcomes in society, and the housing market is a major factor for promoting quality of life and ensuring social equity. To improve housing conditions, there has been extensive research on building Machine Learning (ML)-driven house price prediction solutions to accurately forecast future conditions and help inform actions and policies in the field. In spite of their success in developing high-accuracy models, there is a gap in our understanding of the extent to which various ML-driven house price prediction approaches show ethnic and/or racial bias, which in turn is essential for the responsible use of ML and for ensuring that ML-driven solutions do not exacerbate inequity. To fill this gap, this paper develops several ML models from a combination of structural and neighborhood-level attributes, and conducts comprehensive assessments of the fairness of ML models under various definitions of privileged groups. As a result, it finds that ML-driven house price prediction models show various levels of bias toward protected attributes (i.e., race and ethnicity in this study). Then, it investigates the performance of different bias mitigation solutions, and the experimental results show their various levels of effectiveness on different ML-driven methods. However, in general, the in-processing bias mitigation approach tends to be more effective than the pre-processing one in this problem domain. Our code is available at https://github.com/wahab1412/housing_fairness.
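One simple way to probe the kind of bias such a study measures is to compare the average signed prediction error across a privileged and a protected group; systematic under-valuation of one group's homes is a fairness concern even when overall accuracy is high. The records and the gap statistic below are purely illustrative, not the paper's data or its exact metric.

```python
# Hypothetical records: (predicted_price, actual_price, group), where group
# marks a privileged (1) vs. protected (0) neighborhood. Values are invented.
records = [
    (310_000, 300_000, 1), (295_000, 300_000, 1), (410_000, 400_000, 1),
    (180_000, 200_000, 0), (150_000, 180_000, 0), (220_000, 250_000, 0),
]

def group_mean_error(recs, group):
    """Average signed error (prediction - actual) for one group."""
    errs = [pred - actual for pred, actual, g in recs if g == group]
    return sum(errs) / len(errs)

# A regression analogue of statistical parity: a large gap in mean signed
# error means one group is systematically over- or under-valued.
gap = group_mean_error(records, 1) - group_mean_error(records, 0)
print(gap)
```

In this toy data the privileged group is slightly over-valued while the protected group is under-valued, so the gap is large and positive.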
- North America > United States > Texas > Bexar County > San Antonio (0.14)
- North America > United States > Illinois (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- (7 more...)
Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors
Abdelrahman, Ahmed S., Abdel-Aty, Mohamed, Yang, Samgyu, Faden, Abdulrahman
Understanding the factors contributing to traffic crashes and developing strategies to mitigate their severity is essential. Traditional statistical methods and machine learning models often struggle to capture the complex interactions between various factors and the unique characteristics of each crash. This research leverages a large language model (LLM) to analyze freeway crash data and provide crash causation analysis accordingly. By compiling 226 traffic safety studies related to freeway crashes, a training dataset encompassing environmental, driver, traffic, and geometric design factors was created. The Llama3 8B model was fine-tuned using QLoRA to enhance its understanding of freeway crashes and their contributing factors, as covered in these studies. The fine-tuned Llama3 8B model was then used to identify crash causation without pre-labeled data through zero-shot classification, providing comprehensive explanations to ensure that the identified causes were reasonable and aligned with existing research. Results demonstrate that LLMs effectively identify primary crash causes such as alcohol-impaired driving, speeding, aggressive driving, and driver inattention. Incorporating event data, such as road maintenance, offers more profound insights. The model's practical applicability and potential to improve traffic safety measures were validated by a high level of agreement (88.89%) among researchers in the field of traffic safety, as reflected in questionnaire results. This research highlights the complex nature of traffic crashes and how LLMs can be used for comprehensive analysis of crash causation and other contributing factors. Moreover, it provides valuable insights and potential countermeasures to aid planners and policymakers in developing more effective and efficient traffic safety practices.
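The zero-shot step can be pictured as prompting the fine-tuned model with a crash narrative and a fixed list of candidate causes, asking for the most likely one plus an explanation. The prompt builder below is a hypothetical illustration of that setup; the wording and the cause list are assumptions, not the authors' template.

```python
# Candidate causes taken from those the abstract reports the model identifies;
# a real deployment would use the study's full label set.
CAUSES = [
    "alcohol-impaired driving",
    "speeding",
    "aggressive driving",
    "driver inattention",
]

def zero_shot_prompt(narrative, causes=CAUSES):
    """Build a zero-shot classification prompt for a crash narrative.
    The instruction wording is illustrative, not the paper's actual prompt."""
    options = "\n".join(f"- {c}" for c in causes)
    return (
        "You are a traffic-safety analyst. Given the crash narrative below, "
        "pick the single most likely primary cause from the options and "
        "briefly explain your reasoning.\n\n"
        f"Narrative: {narrative}\n\nOptions:\n{options}\n\nAnswer:"
    )

prompt = zero_shot_prompt("Vehicle drifted across lanes; driver was texting.")
print(prompt)
```

Because the cause labels live in the prompt rather than in training data, no pre-labeled crashes are needed, which is what makes the classification zero-shot.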
- North America > United States > Florida > Orange County > Orlando (0.14)
- North America > United States > Alabama (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (2 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Health & Medicine (1.00)
- Automobiles & Trucks (0.93)
Weaponizing Language Models for Cybersecurity Offensive Operations: Automating Vulnerability Assessment Report Validation; A Review Paper
Almuhaidib, Abdulrahman S, Zain, Azlan Mohd, Zakaria, Zalmiyah, Kamsani, Izyan Izzati, Almuhaidib, Abdulaziz S
The ever-increasing sophistication of cyberwarfare calls for novel solutions. In this regard, Large Language Models (LLMs) have emerged as a highly promising tool for both defensive and offensive cybersecurity strategies. While the existing literature has focused largely on the defensive use of LLMs, very little has been reported on their offensive utilization, namely for Vulnerability Assessment (VA) report validation. Consequently, this paper aims to fill that gap by investigating the capabilities of LLMs in automating and improving the VA report validation process. From a critical review of the related literature, it proposes a new approach to using LLMs to automate the analysis and validation of VA reports, which could potentially reduce the number of false positives and generally enhance efficiency. These results are promising for LLM-based automation of VA report validation, improving accuracy while reducing human effort and strengthening security postures. The contribution of this paper provides further evidence about offensive and defensive LLM capabilities and therefore helps in devising more appropriate cybersecurity strategies and tools.
- North America > United States (0.28)
- Asia > Middle East > Saudi Arabia > Eastern Province > Dammam (0.04)
- Europe > Ireland (0.04)
- Asia > Malaysia > Johor > Johor Bahru (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
Artificial Intelligence (AI) Based Prediction of Mortality for COVID-19 Patients
Tamal, Mahbubunnabi, Rahman, Mohammad Marufur, Alhasim, Maryam, Al Mulhim, Mobarak, Deriche, Mohamed
For severely affected COVID-19 patients, it is crucial to identify high-risk patients and predict survival and the need for intensive care (ICU). Most of the proposed models are not well reported, making them less reproducible and prone to a high risk of bias, particularly in the presence of imbalanced data/classes. In this study, the performances of nine machine and deep learning algorithms, in combination with two widely used feature selection methods, were investigated to predict last status (representing mortality), ICU requirement, and ventilation days. Fivefold cross-validation was used for training and validation purposes. To minimize bias, the training and testing sets were split maintaining similar distributions. Only 10 out of 122 features were found to be useful in prediction modelling, with the acute-kidney-injury-during-hospitalization feature being the most important one. The algorithms' performances depend on feature numbers and data pre-processing techniques. LSTM performs the best in predicting last status and ICU requirement, with 90%, 92%, 86%, and 95% accuracy, sensitivity, specificity, and AUC, respectively. DNN performs the best in predicting ventilation days, with 88% accuracy. Considering all the factors and limitations, including the absence of an exact time point of clinical onset, LSTM with carefully selected features can accurately predict last status and ICU requirement. An appropriate machine learning algorithm with carefully selected features and balanced data can accurately predict mortality, ICU requirement, and ventilation support. Such a model can be very useful in emergencies and pandemics where prompt and precise decisions are needed.
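Splitting while "maintaining similar distributions" is stratified sampling: each fold keeps the same class proportions as the full dataset, which matters with imbalanced outcomes like mortality. A minimal pure-Python sketch (the labels are illustrative, not the study's data):

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose label distributions match the
    full dataset: each class's indices are dealt round-robin into k folds."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Imbalanced mortality-style labels: 90 survivors (0), 10 deaths (1).
labels = [0] * 90 + [1] * 10
for train, test in stratified_kfold(labels, k=5):
    print(sum(labels[i] for i in test), len(test))  # 2 deaths in each fold of 20
```

A plain random split could easily put most of the 10 deaths in one fold; stratification guarantees every fold sees the minority class.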
- Asia > Middle East > Saudi Arabia > Eastern Province > Dammam (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- (5 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.67)
On the Impact of Multi-dimensional Local Differential Privacy on Fairness
Makhlouf, Karima, Arcolezi, Heber H., Zhioua, Sami, Brahim, Ghassen Ben, Palamidessi, Catuscia
Data collected about individuals is regularly used to make decisions that impact those same individuals. For example, census statistics have important implications for all aspects of daily life, including the allocation of political power, the distribution of federal funds, and research in economics and social sciences. In the banking industry, machine learning (ML) models leverage data to proactively monitor customer behavior, reduce the likelihood of false positives, and prevent fraud. In these settings, there is a tension between the need for accurate systems, in which individuals receive what they deserve, and the need to protect individuals from improper disclosure of their sensitive information. Differential privacy (DP) [23] is now widely recognized as the gold standard for providing formal guarantees on the privacy level achieved by an algorithm. However, central DP can only be used on the assumption of a trustworthy server. Local DP (LDP) [32] is a variant that achieves privacy guarantees for each user locally with no assumptions on third-party servers. In other words, LDP ensures that each user's data is locally obfuscated first on the client-side and then sent to the server-side, thus protecting data from privacy leaks on both the client-side and the server-side. Many big tech companies have deployed LDP-based algorithms in their industrial products (e.g., Google Chrome [24] and Apple iOS [4]).
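The client-side obfuscation LDP relies on can be illustrated with generalized randomized response, a standard k-ary LDP mechanism (chosen here for simplicity; not necessarily the one used in the cited deployments): each user reports their true value with a probability tied to the privacy budget ε, and the server inverts the known noise to recover unbiased frequency estimates.

```python
import math
import random

def grr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response: keep the true value with probability
    p = e^eps / (e^eps + k - 1), otherwise report a uniform other value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Server-side unbiased frequency estimate: invert the perturbation."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    return {v: (reports.count(v) / n - q) / (p - q) for v in domain}

rng = random.Random(7)
domain = ["A", "B", "C"]
true_data = ["A"] * 700 + ["B"] * 200 + ["C"] * 100  # true frequencies 0.7/0.2/0.1
reports = [grr_perturb(v, domain, epsilon=2.0, rng=rng) for v in true_data]
est = grr_estimate(reports, domain, epsilon=2.0)
print({v: round(f, 2) for v, f in est.items()})
```

The server never sees raw values, only the noisy reports, yet the aggregate frequencies remain recoverable; lowering ε adds more noise and widens the estimation error, which is exactly the privacy–utility tension the paper studies against fairness.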
- Europe > Switzerland (0.04)
- Europe > Greece > Epirus > Ioannina (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- (3 more...)
Let's Predict Who Will Move to a New Job
Gahar, Rania Mkhinini, Hidri, Adel, Hidri, Minyar Sassi
Any company's human resources department faces the challenge of predicting whether an applicant will search for a new job or stay with the company. In this paper, we discuss how machine learning (ML) is used to predict who will move to a new job. First, the data is pre-processed into a suitable format for ML models. To deal with categorical features, data encoding is applied, and several ML algorithms (MLAs) are run, including Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost). To improve the performance of the ML models, the synthetic minority oversampling technique (SMOTE) is used to balance the training data. Models are assessed using decision support metrics such as precision, recall, F1-score, and accuracy.
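SMOTE rebalances classes by synthesizing new minority samples along line segments between existing minority points and their nearest minority neighbours. A minimal pure-Python sketch with toy data (a real pipeline would use a library implementation such as imbalanced-learn):

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: create synthetic minority points by interpolating
    between a random sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy "will move to a new job" features: minority class has only 4 samples.
minority = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85), (0.3, 0.7)]
new_points = smote(minority, n_new=8)
print(len(minority) + len(new_points))  # minority class grown from 4 to 12
```

Because every synthetic point lies between two real minority samples, the oversampled class stays inside the original feature region instead of merely duplicating rows.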
- Asia > Middle East > Saudi Arabia > Eastern Province > Dammam (0.05)
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)